A Neural Network Clustering Technique for Text-independent Speaker
Abstract
A clustering algorithm for speaker identification based on neural networks is described. This technique is modeled after a previously developed technique in which an N-way speaker identification task is partitioned into N(N-1)/2 two-way classification tasks. Each two-way classification task is performed by a small neural network, referred to as a two-way or pair-wise network, and the decisions of these pair-wise networks are then combined to make the N-way speaker identification decision (Rudasi and Zahorian, 1991 and 1992). Although very accurate, this method has the drawback of requiring a very large number of pair-wise networks. In the new approach, two-way neural network classifiers, each trained only to separate two speakers, are also used to separate other pairs of speakers. In effect, speakers are clustered according to each pair-wise classifier. This greatly reduces the number of pair-wise classifiers required for an N-way classification decision, especially when the number of speakers is very large. For 100 speakers drawn from the TIMIT database, we reduced the required number of pair-wise classifiers by a factor of 5, with no degradation in performance when 2 seconds or more of speech were used for identification. We obtained 100% text-independent speaker identification accuracy for 200 speakers with approximately 6 seconds of speech from each speaker, and 97% accuracy when 2 seconds of speech were used.
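The abstract combines two ideas: splitting an N-way decision into N(N-1)/2 two-way decisions (4,950 pair-wise networks for 100 speakers, 19,900 for 200), and reusing each two-way classifier to cluster all speakers so that far fewer classifiers are needed. The Python sketch below illustrates the second idea under assumptions that are not taken from the paper: a nearest-centroid rule stands in for the small pair-wise neural networks, a speaker is assigned to a cluster by majority vote over its frames, classifiers are added greedily for the first speaker pair not yet separated, and the N-way decision counts cluster votes. All names (`PairwiseClassifier`, `build_clustered_set`, `identify`, `toy_data`) are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]            # one feature vector (e.g. one analysis frame)
Data = Dict[str, List[Frame]]      # speaker name -> training frames


def num_pairwise(n_speakers: int) -> int:
    """Number of two-way tasks in an N-way problem: N(N-1)/2."""
    return n_speakers * (n_speakers - 1) // 2


def centroid(frames: List[Frame]) -> List[float]:
    return [sum(col) / len(frames) for col in zip(*frames)]


class PairwiseClassifier:
    """Hypothetical stand-in for a small two-way network: trained on speakers
    a and b only, it labels a frame by the nearer class centroid."""

    def __init__(self, a: str, b: str, data: Data):
        self.pair = (a, b)
        self.ca, self.cb = centroid(data[a]), centroid(data[b])

    def side(self, frame: Frame) -> int:
        """0 if the frame looks more like speaker a, 1 if more like b."""
        return 0 if math.dist(frame, self.ca) <= math.dist(frame, self.cb) else 1

    def cluster_of(self, frames: List[Frame]) -> int:
        """Put a whole speaker on one side by majority vote over its frames."""
        ones = sum(self.side(f) for f in frames)
        return 1 if 2 * ones > len(frames) else 0


Chosen = List[Tuple[PairwiseClassifier, Dict[str, int]]]


def build_clustered_set(data: Data) -> Chosen:
    """Greedily train pair-wise classifiers until every speaker pair is
    separated by the clusters of at least one of them (assumed rule)."""
    speakers = sorted(data)
    uncovered = list(combinations(speakers, 2))
    chosen: Chosen = []
    while uncovered:
        a, b = uncovered[0]
        clf = PairwiseClassifier(a, b, data)
        # Reuse the a-vs-b classifier to cluster *all* speakers.
        labels = {s: clf.cluster_of(data[s]) for s in speakers}
        chosen.append((clf, labels))
        # Keep only pairs this classifier fails to separate; (a, b) itself is
        # always dropped, so the loop terminates.
        uncovered = [(s, t) for s, t in uncovered
                     if labels[s] == labels[t] and (s, t) != (a, b)]
    return chosen


def identify(test_frames: List[Frame], chosen: Chosen, speakers: List[str]) -> str:
    """Each per-frame decision is a vote for every speaker in the cluster on
    that side; the speaker with the most votes wins."""
    scores = {s: 0 for s in speakers}
    for clf, labels in chosen:
        for frame in test_frames:
            side = clf.side(frame)
            for s in speakers:
                if labels[s] == side:
                    scores[s] += 1
    return max(speakers, key=lambda s: scores[s])


def toy_data(n_speakers: int) -> Data:
    """Synthetic, well-separated 2-D 'speakers' (not real speech features)."""
    return {f"spk{k}": [(float(2 ** k), dy) for dy in (0.1, -0.1, 0.2, -0.2)]
            for k in range(n_speakers)}


if __name__ == "__main__":
    data = toy_data(8)
    chosen = build_clustered_set(data)
    print(num_pairwise(8), len(chosen))
```

On this toy data the greedy rule keeps 7 classifiers instead of the 28 a full pair-wise scheme would need, and each speaker's own frames are identified correctly. Real speech features cluster far less cleanly, so the reduction is smaller in practice; the paper reports a factor of about 5 for 100 TIMIT speakers, i.e. roughly 990 classifiers instead of 4,950.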
Similar resources
Speaker-independent 3D face synthesis driven by speech and text
In this study, a complete system that generates visual speech by synthesizing 3D face points has been implemented. The estimated face points drive MPEG-4 facial animation. This system is speaker-independent and can be driven by audio or by both audio and text. The synthesis of visual speech was realized by a codebook-based technique, which is trained with audio-visual data from a speaker. An audio...
Independent Speaker Identification System Based on
In this paper, we describe a text-independent, phoneme-based speaker identification system which uses adaptive wavelets to model the phonemes. This system identifies a speaker by modeling a very short segment of phonemes and then by clustering all the phonemes belonging to the same speaker into one class. The classification is achieved by using a two-layer feed-forward neural network classifier. Th...
Neural Network Based Missing Feature Method For Text-Independent Speaker Identification
The first step of missing feature methods in text-independent speaker identification is to identify the highly corrupted parts of the spectrographic representation of speech as missing features. Most mask estimation techniques rely on explicit estimation of the characteristics of the corrupting noise and usually fail when the noise estimate is inaccurate. We present a mask estimation technique that uses ne...
A Text Independent Speaker Recognition System Using a Novel Parametric Neural Network
This paper presents a new speaker recognition technique aimed at high identification accuracy and low impostor acceptance. This method is based on a modified neural network, which is an extended and improved version of a Self-Organizing Map in multiple dimensions. The goal of this methodology is to achieve high identification accuracy and impostor rejection. The proposed method, Multiple Parame...
A Chinese phoneme clustering theory and its application to a text independent speaker verification system
This paper presents a new approach to Chinese phoneme clustering and a text-independent speaker verification system that applies this technique. Instead of the conventional verification method, which uses averaged features, the new method includes both the dynamic and static features of speech. It also leads to a fast and efficient clustering algorithm in the training phase. The...